A distance-based classifier built on class model

Authors

  • Dipankar Bachar
  • Rosa Meo
Abstract

In this paper we propose a new type of distance-based classifier. Traditionally, these classifiers are instance-based: they classify a test instance by computing a similarity measure between that instance and the instances in the training set, and assigning it the class of the k most similar instances. This method is simple but has some disadvantages, including greater sensitivity to local noise in the training set, the need to compute many similarity measures, and the difficulty of choosing the value of k. In our approach the classifier is model-based: it first constructs a probabilistic model for each class based on a selection of itemsets, weighted by their probability of appearing in the training instances of the given class. It then uses these class models to predict the class of a test instance by calculating the distance between the test instance and each of the class models. We have experimented with four different proximity measures and different itemset selection metrics on a large collection of datasets from the UCI archive. We show that our method has many benefits. It reduces the number of distance computations. It improves on the classification accuracy of state-of-the-art classifiers such as decision trees, SVM, k-NN, Naive Bayes, rule-based classifiers and association rule-based ones. Finally, it is less sensitive to overfitting and to noise. We also perform an automatic tuning of the algorithmic parameters, such as the itemset frequency threshold and the itemset selection.
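
The model-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the four proximity measures or the itemset selection metrics, so the sketch assumes itemsets of at most two items, a simple per-class frequency threshold (`min_freq`), and cosine similarity as the proximity measure. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def itemsets(instance, max_size=2):
    """Enumerate every itemset (as a frozenset) of 1..max_size items in an instance."""
    items = sorted(instance)
    return [frozenset(c)
            for k in range(1, max_size + 1)
            for c in combinations(items, k)]


def build_class_models(instances, labels, min_freq=0.5, max_size=2):
    """Build one model per class: {itemset: estimated P(itemset | class)}.

    Assumption: an itemset is selected when its relative frequency within
    the class reaches min_freq (a stand-in for the paper's selection metrics).
    """
    counts = defaultdict(Counter)
    class_sizes = Counter(labels)
    for inst, lab in zip(instances, labels):
        counts[lab].update(itemsets(inst, max_size))
    return {
        lab: {s: n / class_sizes[lab]
              for s, n in c.items()
              if n / class_sizes[lab] >= min_freq}
        for lab, c in counts.items()
    }


def cosine_similarity(model, instance, max_size=2):
    """Cosine similarity between a weighted class model and a binary instance vector."""
    present = set(itemsets(instance, max_size))
    dot = sum(w for s, w in model.items() if s in present)
    norm_model = sqrt(sum(w * w for w in model.values()))
    norm_inst = sqrt(len(present))
    if norm_model == 0 or norm_inst == 0:
        return 0.0
    return dot / (norm_model * norm_inst)


def classify(models, instance, max_size=2):
    """Assign the class whose model is closest; one comparison per class."""
    return max(models,
               key=lambda lab: cosine_similarity(models[lab], instance, max_size))


# Toy usage with made-up attribute=value items.
train = [
    {"color=red", "shape=round"},
    {"color=red", "shape=round"},
    {"color=green", "shape=long"},
    {"color=yellow", "shape=long"},
]
labels = ["apple", "apple", "banana", "banana"]
models = build_class_models(train, labels)
print(classify(models, {"color=red", "shape=round"}))
```

In this sketch a test instance is compared once per class model rather than once per training instance, which is the sense in which a model-based classifier needs fewer distance computations than k-NN.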

Related resources

Fault diagnosis in a distillation column using a support vector machine based classifier

Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...

Application of Artificial Neural Network in Landscape Change Process in Gharesou Watershed, Golestan Province

Land use change is certainly the most important factor affecting the conservation of natural ecosystems, resulting in the conversion of natural lands such as forests and pastures into agricultural, industrial and urban areas. Despite numerous studies investigating landscape patterns due to land use change, the driving forces of landscape change have been less studied in Iran. In this study, Arti...

A new method for assigning membership to data and identifying noise and outliers using a fuzzy support vector machine

Support Vector Machine (SVM) is one of the important classification techniques and has recently attracted the attention of many researchers. However, this approach has some limitations. Determining the hyperplane that distinguishes classes with the maximum margin and calculating the position of each point (train data) in SVM linear classifier can be interpreted as computing a data membersh...

2D Dimensionality Reduction Methods without Loss

In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques have been applied in a lossless dimensionality reduction framework for face recognition. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...

Robustified distance based fuzzy membership function for support vector machine classification

Fuzzification of the support vector machine has been used to deal with the outlier and noise problem. This is achieved by means of a fuzzy membership function, which is generally built based on the distance of the points to the class centroid. The focus of this research is twofold. Firstly, by taking advantage of robust statistics in the fuzzy SVM, more emphasis on reducing the im...

Publication date: 2008